Synchronized stream packing

ABSTRACT

There are provided methods and apparatus for synchronized stream packing of packets that differ contextually between A/V streams in a parallel presentation. A method includes the step of identifying sub-picture/subtitle packets and/or audio packets having arrival timestamps and/or presentation timestamps that match an arrival timestamp and/or a presentation timestamp, respectively, of video packets. The method also includes the step of packing a Video Object Unit (VOBU) and/or a Transport Stream (TS) with the identified sub-picture/subtitle and audio packets and the video packets having the matching timestamps.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Ser. No. 60/674,767, filed Apr. 26, 2006, which is incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

The present invention relates generally to Digital Versatile Discs, previously known as Digital Video Discs (DVDs), High Definition Digital Versatile Discs (HD DVD), and Blu-Ray Disc (BD), and more particularly to a technique for facilitating synchronization among the sub-streams of different audio/visual (A/V) streams embedded on a DVD, HD DVD, or BD.

BACKGROUND OF THE INVENTION

The DVD, HD DVD and Blu-ray specifications currently define mechanisms for seamlessly switching among multiple parallel A/V streams. However, in each case, the audio and sub-picture content of the streams is restricted to be bit-for-bit identical among all of the streams. This prevents any potential damage to audio speakers that could result from signal spikes caused by differences in the audio data from one A/V stream to another, and also reduces the restrictions regarding organization of such data within each multiplexed A/V stream. However, these restrictions also greatly limit the range of applications for which the seamless multi-angle feature may be used.

The development of the DVD followed the development of the Compact Disc (CD) in an effort to achieve sufficient storage capacity for large video files to enable a single disc to carry a full length motion picture, albeit compressed using a compression technique such as the Moving Picture Experts Group (MPEG) compression technique. Since its first introduction in the mid 1990s, the DVD has proliferated, becoming the preferred medium for wide scale distribution of motion picture and video content to consumers. Similar optical disc formats for delivery of higher quality and greater amounts of audiovisual content have been developed as planned successors to DVD. Two of the most prominent formats are known as HD DVD and BD.

Present day DVDs, HD DVDs, and BDs typically include at least one, and usually several, A/V streams in parallel synchronism to each other. Often such A/V streams include different recordings of the same scene shot from different angles. Hence, such different A/V streams are often referred to as “angles”. Selection of different angles (i.e., different streams) occurs through a process known as “multi-angle navigation” whereby a viewer selects a desired angle by selecting an associated icon on a display screen. The DVD, HD DVD, and BD specifications adopted by the manufacturers of these discs and associated playback devices define a process known as “multi-angle video” whereby a content author can define as many as nine concurrent A/V streams, any one of which can appear on a display screen at any time. During playback, the viewer can switch seamlessly among a set of synchronized A/V streams by actuating a command via a button on a DVD, HD DVD, or BD player or on the remote control device for such player; this form of multi-angle navigation is known as seamless multi-angle. However, under known format specifications and implementations of currently available DVD, HD DVD, and BD authoring tools, audio and sub-picture data stored in each A/V stream remains identical. That is, only different video data is allowed between angles. Sub-picture data describes the rendering of buttons, subtitles, and other graphical elements displayed over video. This restriction both prevents different audio and sub-picture content from being presented automatically when a parallel A/V stream is selected and leads to redundant copies of audio and sub-picture data being stored on the delivery medium, limiting space for other content.

At a basic level, A/V streams are made up of data packets for the sub-streams (audio, video, and sub-picture), which are joined together in short units that, when read sequentially, make up the presented stream. In DVD-Video, these fundamental data units are known as Video Object Units, or VOBUs, and each includes about 0.4 to 1 second of presentation data. In HD DVD-Video, these are known as EVOBUs. The terms VOBUs and EVOBUs may be used interchangeably herein for illustrative purposes. When multiple A/V streams are presented in parallel, each stream collects one or more VOBUs into an Interleave Unit, or ILVU, which is synchronized with ILVUs for other parallel A/V streams based on the video presentation time. Thus, when a new stream is selected, the data from the current ILVU plays until the end of the ILVU and the ILVU for the new stream is presented seamlessly at that time. In this way, seamless presentation of video is assured.

BD refers to a similar combination of packets using different terminology, namely Transport Stream (TS). BD does not limit the duration of presentation data in the unit. Instead of ILVUs, BD uses angle change points in each TS to mark points at which streams can be changed while ensuring video continuity.

Audio, video, and sub-picture packets in VOBUs, TS, RTP or other packetized multimedia formats are all typically marked with a first type of timestamp indicating when they should be delivered for decoding and a second type of timestamp indicating when they should be presented. In the case of VOBUs, the delivery timestamps are encoded in the “system_clock_reference” as defined in ISO/IEC 13818-1. In the case of Transport Streams (TSs), delivery timestamps are typically called “arrival_timestamps” as defined in some of the specifications derived from ISO/IEC 13818-1. As used herein, the term “arrival timestamp” collectively refers to the delivery timestamp in VOBUs and TSs. The presentation timestamps are the usual PTSs as defined in ISO/IEC 13818-1.

Due to different buffering models and decoder designs, non-video packets in a single VOBU (or at an angle change point marker in a TS) may not all refer to similar presentation times. For example, an audio packet may refer to presentation time 8, whereas a video packet may refer to presentation time 4, the audio packet for presentation time 4 having been delivered from a previous VOBU. When audio and sub-picture/subtitle data are identical between VOBUs in ILVUs (or between TSs) for different A/V streams in a parallel presentation, switching ILVUs or TSs has no effect on the synchronization or correspondence of audio, sub-picture/subtitle, and video. However, when audio and sub-picture data packets differ between VOBUs or TSs for different A/V streams, a case could occur where audio or sub-picture/subtitle packets corresponding to the presentation time of the video from the new VOBU or TS have already been delivered from a previous VOBU or TS, resulting in audio or sub-picture/subtitle presentation that, while presented at the proper time, is out of correspondence/synchronization with the current context.
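The situation described above can be illustrated with a minimal sketch. The model is an assumption made for illustration only: each unit carries one video packet for its own presentation time and one audio packet delivered a fixed number of time units ahead (the hypothetical `audio_lead`), and names such as `Packet`, `make_units`, and `mismatches` do not come from any disc specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    kind: str   # "video" or "audio"
    pts: int    # presentation time, in arbitrary units
    angle: int  # A/V stream the packet was authored for


def make_units(angle, count, audio_lead):
    """Build `count` units; audio is delivered `audio_lead` units early."""
    return [
        [Packet("video", i, angle), Packet("audio", i + audio_lead, angle)]
        for i in range(count)
    ]


def present(streams, schedule):
    """schedule[i] names the angle whose unit i is read from the disc.

    Returns (pts, video_angle, audio_angle) for every presentation time
    at which both video and audio are available.
    """
    video, audio = {}, {}
    for i, angle in enumerate(schedule):
        for packet in streams[angle][i]:
            target = video if packet.kind == "video" else audio
            target.setdefault(packet.pts, packet.angle)
    return [(pts, video[pts], audio[pts]) for pts in sorted(video) if pts in audio]


def mismatches(streams, schedule):
    """Presentation times whose audio belongs to a different angle than the video."""
    return [pts for pts, v, a in present(streams, schedule) if v != a]


# Audio delivered one unit early: switching angles after unit 1 leaves the
# audio for time 2 coming from the old angle.
early = {0: make_units(0, 4, 1), 1: make_units(1, 4, 1)}
# Audio packed with the video of the same presentation time.
aligned = {0: make_units(0, 4, 0), 1: make_units(1, 4, 0)}
```

With the schedule `[0, 0, 1, 1]`, the early-delivery layout yields a mismatch at the first presentation time after the switch, while the aligned layout yields none.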

Thus, there exists a need for a method of storing data such that audio and sub-picture data can be contextually different in parallel, synchronized A/V streams playing from any one of these optical disc formats, while also maintaining stream continuity as well as synchronization with video data as the viewer interactively selects different A/V streams during the presentation.

SUMMARY OF THE INVENTION

These and other drawbacks and disadvantages of the prior art are addressed by the present invention, which is directed to synchronized stream packing.

According to an aspect of the present invention, there is provided a method for synchronized stream packing of packets that differ contextually between A/V streams in a parallel presentation. The method includes the step of identifying sub-picture/subtitle packets and/or audio packets having arrival timestamps and/or presentation timestamps that match an arrival timestamp and/or a presentation timestamp, respectively, of video packets. The method also includes the step of packing a Video Object Unit (VOBU) and/or a Transport Stream (TS) with the identified sub-picture/subtitle and audio packets and the video packets having the matching timestamps.

According to yet another aspect of the present invention, there is provided an apparatus for synchronized stream packing of packets that differ contextually between A/V streams in a parallel presentation. The apparatus includes means for identifying sub-picture/subtitle packets and/or audio packets having arrival timestamps and/or presentation timestamps that match an arrival timestamp and/or a presentation timestamp, respectively, of video packets. The apparatus also includes means for packing a Video Object Unit (VOBU) and/or a Transport Stream (TS) with the identified sub-picture/subtitle and audio packets and the video packets having the matching timestamps.

According to a further aspect of the present invention, there is provided a method for presenting a different A/V stream from among a plurality of A/V streams that differ contextually in a parallel presentation. The method includes the step of packing an audio frame header into an audio packet at a beginning of a first Video Object Unit (VOBU) in an InterLeaVe Unit (ILVU), or at an angle change point marker in a Transport Stream (TS). The method also includes the step of packing a last audio packet, either in a last VOBU in the ILVU or another ILVU in a same one of the plurality of A/V streams, or immediately prior to another angle change point marker in the TS, so as to conclude with a complete audio frame.

These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be better understood in accordance with the following exemplary figures, in which:

FIG. 1 is a block diagram illustrating a DVD player to which the present invention may be applied, in accordance with an illustrative embodiment thereof;

FIG. 2 is a flow diagram illustrating a method for synchronized stream packing of packets that differ contextually between A/V streams in a parallel presentation in accordance with the present principles;

FIG. 3 is a flow diagram illustrating a method for synchronized stream packing of packets that differ contextually between A/V streams in a parallel presentation in accordance with the present principles;

FIG. 4 is a flow diagram illustrating a method for presenting a different A/V stream from among a plurality of A/V streams that differ contextually in a parallel presentation in accordance with the present principles; and

FIG. 5 is a block diagram illustrating the relationship among an audio/visual stream, Video Object Units (VOBUs) and Interleave Units (ILVUs).

DETAILED DESCRIPTION

The present invention is directed to synchronized stream packing. In accordance with an embodiment, a method is provided for constraining the organization of audio and sub-picture packets within multiplexed streams (e.g., MPEG program and transport streams) in order to allow seamless switching among multiple interleaved audio/video (A/V) presentations in which the audio content and/or sub-picture/subtitle content is different.

The present description illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.

Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.

Turning to FIG. 1, a Digital Versatile Disc (DVD) player to which the present invention may be applied is indicated generally by the reference numeral 10. The DVD player 10 includes a drive motor 12 that rotates a DVD 13 under the control of a servomechanism 14. A pick-up head motor 16, also controlled by the servomechanism 14, serves to displace an optical pick-up head 18 across the DVD 13 to read information carried thereby. A pre-amplifier 20 amplifies the output signal of the pick-up head 18 for input to a decoder 22 that decodes the optical information read from the DVD 13 to yield a program stream. A de-multiplexer 24 de-multiplexes the program stream into separate components: (a) an audio stream; (b) a video stream; (c) a sub-picture stream; and (d) navigation information, typically in the form of metadata or the like.

The audio, video, and sub-picture streams undergo decoding by a separate one of the audio decoder 26, video decoder 28 and sub-picture decoder 30, respectively. A synchronizer 32, sometimes known as a presentation engine, serves to synchronize and combine the separately decoded audio, video and sub-picture streams into a video stream, with embedded audio for suitable reproduction in accordance with one of several known television formats including, but not limited to, NTSC or PAL. A video digital-to-analog converter 34 converts the video stream into analog video for display on a display device (not shown) such as a television set, while an audio digital-to-analog converter 36 converts the embedded audio to analog audio for subsequent reproduction by the display device or by other means (not shown).

Within the DVD player 10, a central processing unit (CPU) 38, typically in the form of a microprocessor with associated memory, or a microcomputer or microcontroller, serves to control navigation, as well as other aspects of the DVD player, in accordance with viewer commands entered through a viewer interface (U/I) 40, typically comprising the combination of an Infrared (I/R) transmitter, in the form of a remote control, and an I/R receiver. Specifically with regard to navigation, the CPU 38 receives decoded metadata from the de-multiplexer 24 and generates menu information for receipt by the synchronizer 32. In this way, the menu information ultimately undergoes display for viewing by the viewer. In response to the displayed information, the viewer typically will enter one or more commands through the U/I 40 for receipt by the CPU 38, which, in turn, controls the servomechanism 14 to displace the pick-up head 18 to retrieve the desired program content.

The DVD specification (DVD Specifications for Read-Only Disc/Part 3. VIDEO SPECIFICATIONS, Version 1.0, August 1996), defines the smallest object to which DVD navigation can apply as a Video Object Unit (VOBU). The VOBU typically includes multiplexed video, audio, sub-picture, highlight and other navigation data, corresponding to a playback duration of about 0.4 to 1.2 seconds. Multiple sub-streams of audio and sub-picture data can exist in each VOBU (e.g., stereo and surround sound audio sub-streams and/or German and Portuguese subtitles). This combination of such multiplexed data constitutes an “A/V stream.” In a multi-angle segment, multiple A/V streams are interleaved together into a single Video Object (VOB) stream in order to allow quick access from one stream to another for seamless or near-seamless switching.

The DVD specification defines an Interleave Unit (ILVU) as a block of one or more VOBUs in order to align the A/V stream content of multiple angles with a common time stamp, providing synchronization of the A/V streams. During playback, the synchronizer 32 decodes and displays only the ILVUs corresponding to the currently selected A/V stream. The DVD specification defines a maximum size of the ILVU based on number of angles (i.e., number of available streams), scan speed of the physical device, and size of the decode buffer (not shown). If this maximum size is exceeded, seamless playback of any angle cannot be guaranteed.

In accordance with an embodiment, there is provided a method for the storage of sub-picture/subtitle and/or audio data within at least one of a plurality of audio-visual streams presented in parallel in order to maintain synchronization between sub-picture/subtitle, audio, and video data as well as provide continuity between such data as different Audio/Visual (A/V) streams are selected during a presentation.

To ensure constant synchronization and correspondence with video for audio and sub-picture/subtitle packets that differ contextually between A/V streams in a parallel presentation, Video Object Units (VOBUs) or Transport Streams (TSs) should include sub-picture/subtitle and audio packets whose arrival timestamps match the arrival timestamp of the video packets (within one unit of time reference of the sub-picture/subtitle or audio packet, respectively). It is to be appreciated that sub-picture/subtitle data typically has no innate frame rate; instead, its frame rate is usually derived from or related to the video frame rate. The same rule applies to the presentation timestamps: VOBUs or TSs should include sub-picture/subtitle and audio packets whose presentation timestamps match the presentation timestamp of the video packets (within one unit of time reference of the sub-picture/subtitle or audio packet, respectively). If VOBUs or TSs are packed in this way, both synchronization and contextual correspondence among audio, sub-picture/subtitle, and video data are maintained where audio or sub-picture/subtitle data differs contextually between VOBUs or TSs for different A/V streams.
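The matching rule can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: timestamps are taken to be integer ticks, the per-kind "unit of time reference" values in `UNIT_OF_TIME` are hypothetical example figures, "within one unit" is read as an inclusive comparison, and the names `Packet` and `identify_matching` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    kind: str      # "video", "audio", or "subpicture"
    arrival: int   # arrival (delivery) timestamp, in ticks
    pts: int       # presentation timestamp, in ticks


# Hypothetical units of time reference, in ticks, chosen only for the example.
UNIT_OF_TIME = {"audio": 2880, "subpicture": 3003}


def identify_matching(candidates, video_packets, field="pts", unit=UNIT_OF_TIME):
    """Return the non-video packets whose timestamp `field` ("pts" or
    "arrival") lies within one unit of time reference of some video packet."""
    selected = []
    for packet in candidates:
        value = getattr(packet, field)
        tolerance = unit[packet.kind]
        if any(abs(value - getattr(v, field)) <= tolerance for v in video_packets):
            selected.append(packet)
    return selected


video = [Packet("video", 0, 0), Packet("video", 3003, 3003), Packet("video", 6006, 6006)]
candidates = [
    Packet("audio", 2880, 2880),         # close to the video packet at 3003
    Packet("audio", 12000, 12000),       # too far from every video packet
    Packet("subpicture", 9000, 9000),    # within 3003 ticks of 6006
    Packet("subpicture", 20000, 20000),  # too far from every video packet
]
by_pts = identify_matching(candidates, video, field="pts")
by_arrival = identify_matching(candidates, video, field="arrival")
```

The same function serves both rules: passing `field="arrival"` applies the arrival timestamp test, and `field="pts"` applies the presentation timestamp test.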

A further issue is the potential corruption of audio or sub-picture/subtitle data when an ILVU for a new A/V stream is presented, as audio or sub-picture data packets at the beginning of the first VOBU in that ILVU (or at the angle change point marker of a TS) may be fragmented, and unable to be decoded until a subsequent, whole, packet occurs.

To resolve this issue, the audio data packet at the start of the first VOBU in an ILVU (or at an angle change point marker of a TS) should include an audio frame header, and the last audio packet in the last VOBU in an ILVU (or the last audio packet immediately prior to an angle change point marker in a TS) should include a complete audio frame, i.e., no audio frame fragmentation should occur across any ILVU boundary (or across any angle change point marker). Similarly, sub-picture/subtitle data must start with a Sub-Picture Unit (SPU) header or an Epoch start header.

Turning to FIG. 2, a method for synchronized stream packing of packets that differ contextually between A/V streams in a parallel presentation is indicated generally by the reference numeral 200.

The method 200 includes a start block 205 that passes control to a function block 210. The function block 210 identifies sub-picture/subtitle packets and/or audio packets whose arrival timestamps match an arrival timestamp of the video packets, and passes control to a function block 220.

The function block 220 packs a Video Object Unit (VOBU) or a Transport Stream (TS) with the identified sub-picture/subtitle and audio packets and the video packets having the matching arrival timestamps, and passes control to an end block 225. The end block 225 terminates the method.

Turning to FIG. 3, a method for synchronized stream packing of packets that differ contextually between A/V streams in a parallel presentation is indicated generally by the reference numeral 300.

The method 300 includes a start block 305 that passes control to a function block 310. The function block 310 identifies sub-picture/subtitle packets and/or audio packets whose presentation timestamps match a presentation timestamp of the video packets, and passes control to a function block 320. The function block 320 packs a Video Object Unit (VOBU) or a Transport Stream (TS) with the identified sub-picture/subtitle and audio packets and the video packets having the matching presentation timestamps, and passes control to an end block 325. The end block 325 terminates the method.
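The identify-then-pack flow of methods 200 and 300 might be sketched as below. The data layout is an assumption for illustration: each unit is represented only by the timestamps of its video packets, non-video packets are `(kind, timestamp)` pairs, and `pack_units` and the tolerance table are hypothetical names. A packet that lies within tolerance of two adjacent units is placed in the earlier one, which is one possible policy among several.

```python
def pack_units(video_units, others, unit_of_time):
    """Place each non-video packet in the first unit holding a video packet
    whose timestamp is within one unit of time reference of it.

    video_units  -- list of lists of video timestamps, one list per unit
    others       -- list of (kind, timestamp) pairs for audio/sub-picture
    unit_of_time -- mapping from kind to its tolerance, in the same ticks
    Returns (packed_units, leftover_packets).
    """
    packed = [{"video": list(v), "other": []} for v in video_units]
    leftover = []
    for kind, stamp in others:
        for unit in packed:
            if any(abs(stamp - v) <= unit_of_time[kind] for v in unit["video"]):
                unit["other"].append((kind, stamp))
                break
        else:
            # No unit has a matching video timestamp; report instead of guessing.
            leftover.append((kind, stamp))
    return packed, leftover


video_units = [[0, 10, 20], [30, 40, 50]]
others = [("audio", 2), ("audio", 33), ("subpicture", 45), ("audio", 70)]
packed, leftover = pack_units(video_units, others, {"audio": 4, "subpicture": 10})
```

The timestamps passed in may be either arrival or presentation timestamps, so the same routine covers both flow diagrams; packets returned in `leftover` would need to be handled by the authoring tool.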

Turning to FIG. 4, a method for presenting a different A/V stream from among a plurality of A/V streams that differ contextually in a parallel presentation is indicated generally by the reference numeral 400.

The method 400 includes a start block 405 that passes control to a function block 410. The function block 410 packs an audio frame header into an audio packet at a beginning of a first Video Object Unit (VOBU) in an InterLeaVe Unit (ILVU), or packs an audio frame header into an audio packet at an angle change point marker in a Transport Stream (TS), and passes control to a function block 420.

The function block 420 packs a last audio packet in a last VOBU in the ILVU (or in another ILVU in the same A/V stream), or packs a last audio packet immediately prior to another angle change point marker in the TS, so as to conclude with a complete audio frame (audio frame fragmentation is non-existent across any ILVU boundaries or angle change markers), and passes control to a function block 430.

The function block 430 packs sub-picture/subtitle packets to start with a Sub-Picture Unit (SPU) header or an Epoch start header, and passes control to an end block 435. The end block 435 terminates the method.
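The three constraints enforced by method 400 can be expressed as a simple check on an already packed unit. The sketch below is an assumption-laden illustration: it models audio as constant-size frames so that "concludes with a complete audio frame" reduces to a byte-count test, and the `Segment` record and `check_segment` function are hypothetical names, not structures defined by any disc format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One ILVU, or the span between two angle change points in a TS."""
    audio_bytes: int                    # audio payload carried by the segment
    first_audio_has_header: bool        # first audio packet starts a frame
    first_subpicture_has_header: bool   # SPU header or Epoch start header first


def check_segment(segment, frame_size):
    """Return a list of constraint violations; an empty list means the
    segment can be entered and left without fragmenting audio or
    sub-picture data. Assumes constant-size audio frames."""
    problems = []
    if not segment.first_audio_has_header:
        problems.append("audio does not start with a frame header")
    if segment.audio_bytes % frame_size != 0:
        problems.append("last audio frame is incomplete")
    if not segment.first_subpicture_has_header:
        problems.append("sub-picture data does not start with a header")
    return problems


FRAME_SIZE = 768  # hypothetical frame size in bytes, for the example only
good = Segment(audio_bytes=5 * FRAME_SIZE,
               first_audio_has_header=True,
               first_subpicture_has_header=True)
bad = Segment(audio_bytes=5 * FRAME_SIZE + 100,
              first_audio_has_header=True,
              first_subpicture_has_header=False)
```

An authoring tool could run such a check over every segment of every angle before writing the multiplexed stream; variable-size audio frames would need a frame-aware parser instead of the modulo test.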

Turning to FIG. 5, the relationship of multiplexed A/V stream data to VOBU and ILVU data structures for multi-angle video is indicated generally by the reference numeral 500. As illustrated in FIG. 5, each block of the program stream decoded by the decoder 22 of FIG. 1 includes a navigation packet (NV_PCK), a video packet (V_PCK), an audio packet (A_PCK) and a sub-picture packet (SP_PCK). The DVD specification defines a Seamless Angle Information data structure (SML_AGLI) in the navigation data structure (DSI) portion of the NV_PCK at the beginning of each VOBU that includes a table of ILVU start points indicating the location where the next ILVU for each seamless angle is located. Such information enables the CPU 38 of FIG. 1 to direct the servomechanism 14 to the correct location within the VOB stream when it is ready to begin presenting the next ILVU.
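The role of the table of ILVU start points can be illustrated with a small simulation. This is only a conceptual sketch: the dictionary layout, the addresses, and the function `playback_path` are hypothetical and do not reproduce the actual SML_AGLI binary format. Each ILVU is keyed by its address and carries a table mapping an angle number to the address of the next ILVU for that angle.

```python
def playback_path(ilvus, start, selections):
    """Follow the per-ILVU next-address tables.

    ilvus      -- mapping: address -> {"angle": n, "next": {angle: address}}
    start      -- address of the first ILVU presented
    selections -- angle selected by the viewer while each ILVU plays
    Returns the list of ILVU addresses visited, in order.
    """
    path = [start]
    current = start
    for angle in selections:
        following = ilvus[current]["next"].get(angle)
        if following is None:  # end of the multi-angle segment
            break
        path.append(following)
        current = following
    return path


# Two angles, three time slots; addresses are arbitrary example values.
ilvus = {
    100: {"angle": 1, "next": {1: 300, 2: 400}},
    200: {"angle": 2, "next": {1: 300, 2: 400}},
    300: {"angle": 1, "next": {1: 500, 2: 600}},
    400: {"angle": 2, "next": {1: 500, 2: 600}},
    500: {"angle": 1, "next": {}},
    600: {"angle": 2, "next": {}},
}
path = playback_path(ilvus, 100, [1, 2, 2])
```

In this model the current ILVU always plays to its end, and the viewer's selection only determines which address is read next, which mirrors the seamless switching behavior described for ILVUs.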

In addition, the DVD specification defines several data structures within a portion of the navigation data at the beginning of each VOBU that describe the Highlight Information (HLI) for interactive buttons. These data structures, such as the Highlight General Information (HLI_GI), Button Color Information Table (BTN_COLIT), and Button Information Table (BTN_IT), define the number, position, appearance, and function of the buttons that appear in the screen display.

These and other features and advantages of the present invention may be readily ascertained by one of ordinary skill in the pertinent art based on the teachings herein. It is to be understood that the teachings of the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof.

Most preferably, the teachings of the present invention are implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU.

It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present invention.

Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present invention is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.

1-10. (canceled)
11. A method, comprising: packing an audio frame header into an audio packet at an angle change point in a Transport Stream (TS); packing a last audio packet immediately prior to another angle change point in the TS, so as to conclude with a complete audio frame; and packing a subtitle packet to start with an Epoch start header.